<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title>README</title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<h2 id="readme">README</h2>
<p>This document describes how to reproduce the estimation results of the paper entitled &quot;Localized knowledge spillovers and patent citations: A distance-based approach&quot; by Yasusada Murata, Ryo Nakajima, Ryosuke Okamoto, and Ryuichi Tamura.</p>
<h2 id="requirements">Requirements</h2>
<p>The following resources are required to run the estimation programs.</p>
<h3 id="hardware-requirements">Hardware requirements</h3>
<ul>
<li>A modern computer with at least 3 GB of memory and one or more Intel or AMD processors</li>
</ul>
<h3 id="software-requirements">Software requirements</h3>
<ul>
<li>A modern C++ compiler that supports the Standard Template Library (STL)</li>
<li>The latest version of the Intel Threading Building Blocks (Intel TBB) library</li>
<li>The latest version of the GNU Scientific Library (GSL)</li>
<li>The latest version of the Boost C++ Libraries</li>
<li>The GNU <code>make</code> command</li>
<li>The R statistical language (version &gt;= 2.0.0)</li>
<li>The Ruby programming language (version &gt;= 1.8)</li>
</ul>
<h2 id="contents">Contents</h2>
<p>This archive consists of the following directories.</p>
<h3 id="input"><code>input/</code></h3>
<p>This directory contains the input data files, except for the files that define the Originating-Citing-Control (OCC) relationships.</p>
<ul>
<li><code>cite7599t.txt</code>: the citation file obtained from http://www.nber.org/patents/Cite75_99.txt.</li>
<li><code>patent7599.txt</code>: the bibliographic information of the U.S. patents that were granted from 1975 to 1999.</li>
<li><code>date_class7599.txt</code>: the application dates, grant dates, and technology classes of the patents listed in <code>patent7599.txt</code>.</li>
<li><code>usinventor_loc.txt</code>: the inventor information for the &quot;name-matched&quot; inventors listed in <code>patent7599.txt</code>.</li>
<li><code>nclass_names_tab.txt</code>: the technology class definitions.</li>
<li><code>subcategories.txt</code>: the NBER patent technology category definitions, obtained from http://www.nber.org/patents/subcategories.txt.</li>
<li><code>class_match.txt</code>: a copy of the &quot;class-to-category&quot; map file, obtained from http://www.nber.org/patents/class_match.txt.</li>
<li><code>uscities_map.txt</code>: the geographic information file that defines the inventor locations, geocoded to 1990 Census populated places, together with the names of the state, county, CMSA, division, and region (as of 1980) to which each place belongs.</li>
<li><code>state_adj.txt</code> and <code>county_adj.txt</code>: the lists of the states (resp., counties) and their adjacent states (resp., counties).</li>
</ul>
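<p>The adjacency lists are presumably what the neighboring-region augmented tests of Section 3.4 rely on to decide whether two locations are the same or neighboring regions. The Python sketch below illustrates that idea only; it is not part of the archive. The file layout it assumes (a region identifier followed by its adjacent regions, tab-separated, one region per line) is not documented in this README, and the names <code>load_adjacency</code> and <code>same_or_adjacent</code> are hypothetical.</p>

```python
from collections import defaultdict


def load_adjacency(lines):
    """Build a symmetric map: region -> set of adjacent regions.

    ASSUMED layout (hypothetical, not documented in this README): each
    non-empty line is a region identifier followed by the identifiers of
    its adjacent regions, all separated by tabs.
    """
    adjacent = defaultdict(set)
    for line in lines:
        # Strip the line ending and drop empty fields (e.g. trailing tabs).
        fields = [f for f in line.rstrip("\r\n").split("\t") if f]
        if not fields:
            continue  # skip blank lines
        region, neighbours = fields[0], fields[1:]
        for other in neighbours:
            adjacent[region].add(other)
            adjacent[other].add(region)  # treat adjacency as symmetric
    return adjacent


def same_or_adjacent(a, b, adjacent):
    """Return True if regions a and b are identical or share a border."""
    return a == b or b in adjacent.get(a, set())
```

<p>For example, <code>with open("input/state_adj.txt") as f: adj = load_adjacency(f)</code> would load the state list, provided the file really has the assumed layout.</p>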
<h3 id="occdata"><code>occdata/</code></h3>
<p>This directory contains the data files that define the OCC relationships. All of them share the same five-column layout: the first column holds the technology class of the originating patent; the second, third, and fourth columns hold the grant numbers of the originating, citing, and control patents, respectively; and the fifth column holds the lag, in months, between the application date of the citing patent and that of the associated control patent.</p>
<ul>
<li><code>7579_2.txt</code> contains the OCC relationships where the control patents are matched with the citing patents at the 3-digit primary class.</li>
<li><code>7579_6.txt</code> contains the OCC relationships where the control patents are matched with the citing patents at the 6-digit primary class.</li>
<li><code>7579_81.txt</code> contains the OCC relationships in which the control patents share at least one technology subclass with the citing patents.</li>
<li><code>admset.txt</code> contains the OCC relationships that are used in the sensitivity analysis in Section 4.</li>
</ul>
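<p>The five-column OCC layout can be read with a few lines of code. The following Python sketch is not part of the archive: it assumes whitespace-separated columns, no header line, and an integer month lag, and the names <code>OccRecord</code> and <code>read_occ</code> are hypothetical.</p>

```python
from typing import Iterable, Iterator, NamedTuple


class OccRecord(NamedTuple):
    tech_class: str   # column 1: technology class of the originating patent
    originating: int  # column 2: grant number of the originating patent
    citing: int       # column 3: grant number of the citing patent
    control: int      # column 4: grant number of the control patent
    lag_months: int   # column 5: citing-vs-control application-date lag (months)


def read_occ(lines: Iterable[str]) -> Iterator[OccRecord]:
    """Parse OCC rows, assuming whitespace-separated columns and no header."""
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        cls, orig, cite, ctrl, lag = fields
        yield OccRecord(cls, int(orig), int(cite), int(ctrl), int(lag))
```

<p>For example, <code>with open("occdata/7579_2.txt") as f: records = list(read_occ(f))</code> loads one OCC file; blank or malformed lines are skipped silently.</p>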
<h3 id="prog3"><code>prog3/</code></h3>
<p>This directory contains the C++ programs that implement the K-density and matching-rate tests in Section 3. Bourne shell, R, and Ruby scripts are used to compile and visualize the test results.</p>
<ul>
<li><code>dist.cpp</code>: the program file that defines the functions to implement the K-density tests.</li>
<li><code>match.cpp</code>: the definition file of the common routines used for the K-density and matching-rate tests.</li>
<li><code>match.hpp</code>: the header file that is included in the other <code>.cpp</code> files.</li>
<li><code>nclasses.hpp</code>: the data file of the technology class IDs.</li>
<li><code>occ.cpp</code>: the program file that generates the OCC relationships. It also selects the “U.S.” patents whose inventors resided in the contiguous U.S. and excludes self-citation relationships.</li>
<li><code>occ.hpp</code>: the header file included in <code>occ.cpp</code>.</li>
<li><code>occstat.cpp</code>: the program file that counts the OCC relationships used to construct Table 1.</li>
<li><code>tfk.cpp</code>: the program file that defines the functions to implement the matching-rate tests.</li>
<li><code>tfk_adj.cpp</code>: the program file that implements the matching-rate tests.</li>
<li><code>Makefile</code>: the GNU make file that sets up the prerequisites for running the K-density and matching-rate tests.</li>
<li><code>dotfk.R</code>: the R script that investigates the differences between the localized technology classes detected by the K-density test and those detected by the matching-rate test. <code>dotfk.sh</code> is its batch script.</li>
<li><code>gampsi.R</code>: the R script that makes the figures and tables. <code>gampsi.sh</code> is its batch script.</li>
<li><code>gampsi2.R</code>: the R script that provides the same functions as <code>gampsi.R</code> for the results produced by <code>dotfk.R</code>. <code>gampsi2.sh</code> is its batch script.</li>
<li><code>sets_dotfk.R</code>: the R script that compiles the results produced by <code>dotfk.R</code>.</li>
<li><code>summary.R</code>: the R script that counts the number of localized and dispersed classes. <code>summary.sh</code> is its batch script.</li>
<li><code>tfktab.R</code>: the R script to generate the matching-rate test results.</li>
</ul>
<p>The output results are contained in <code>output3.zip</code>.</p>
<h3 id="prog4"><code>prog4/</code></h3>
<p>This directory contains the programs that carry out the sensitivity analysis in Section 4. Some of them are shared with <code>prog3/</code>; see the description above for details.</p>
<ul>
<li><code>useclass.txt</code>: the technology class data.</li>
<li><code>tprob.cpp</code> and <code>tprob.hpp</code>: the C++ programs that calculate the assignment probabilities for the sensitivity analysis.</li>
<li><code>aggregate.rb</code> and <code>aggregate5_100.rb</code>: the Ruby scripts that combine the results of the multiple sensitivity analyses into a single result.</li>
<li><code>sens_graph.R</code>: the R script that draws Figures 7 and 8.</li>
</ul>
<p>The following shell scripts are used to manage the subdirectories and the parameter values.</p>
<ul>
<li><code>DO24816.sh</code> and <code>MR24816.sh</code></li>
<li><code>DO5_100.sh</code> and <code>MR5_100.sh</code></li>
</ul>
<p>The output results are in <code>output4.zip</code>.</p>
<h2 id="section-3">Section 3</h2>
<h3 id="the-directory-structure">The directory structure</h3>
<p>We refer to the top directory in which the contents are located as <code>TOPDIR</code>. The programs assume the following subdirectory structure, which <em>must be created manually</em> with the <code>mkdir</code> command:</p>
<pre><code>TOPDIR +
       `- output +
                 |- kdensity/med21
                 |- kdensity/min21
                 |- kdensity/med811
                 |- kdensity/min811
                 |- matchingrate/med21
                 |- matchingrate/min21
                 |- matchingrate/med811
                 `- matchingrate/min811</code></pre>
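<p>Instead of typing one <code>mkdir</code> command per directory, the tree above can be created with a short script. The Python sketch below is not part of the archive and the name <code>make_output_tree</code> is hypothetical; it creates only the eight leaf directories shown in the tree (the same effect as <code>mkdir -p</code> on each path).</p>

```python
import os

# The eight leaf directories of the tree above, relative to TOPDIR.
SUBDIRS = [
    ("output", test, variant)
    for test in ("kdensity", "matchingrate")
    for variant in ("med21", "min21", "med811", "min811")
]


def make_output_tree(topdir):
    """Create the output tree under topdir (like `mkdir -p` for each path).

    Existing directories are left untouched, so the call is idempotent.
    """
    created = []
    for parts in SUBDIRS:
        path = os.path.join(topdir, *parts)
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

<p>Running <code>make_output_tree(".")</code> from <code>TOPDIR</code> creates the structure; calling it again is harmless.</p>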
<h3 id="compiling-programs">Compiling programs</h3>
<p>In the <code>prog3</code> directory (<code>$</code> denotes the command prompt), type the following to generate the matching-rate test program:</p>
<pre><code>$ make -DTFK</code></pre>
<p>To generate the K-density test program, type:</p>
<pre><code>$ make</code></pre>
<p>For the sensitivity analysis in Section 4, rename the generated binary from <code>CITEDIST</code> to <code>DO</code> for the K-density test and to <code>MR</code> for the matching-rate test.</p>
<h3 id="section-3.1-table-2.">Section 3.1, Table 2.</h3>
<p>In the <code>prog3</code> directory, type:</p>
<pre><code>$ ./matchingrate ../occdata/7579_2.txt 1 2 1 share_one_match ../output/matchingrate/min211.txt
$ ./matchingrate ../occdata/7579_2.txt 1 2 1 majority_match ../output/matchingrate/med211.txt
$ ./matchingrate ../occdata/7579_81.txt 1 2 1 share_one_match ../output/matchingrate/min811.txt
$ ./matchingrate ../occdata/7579_81.txt 1 2 1 majority_match ../output/matchingrate/med811.txt</code></pre>
<p>The outputs are the matching rates aggregated over all technology classes. To obtain the results in Table 2, run <code>tfktab.R</code>. For details of the command-line options, see <code>tfk.cpp</code>.</p>
<h3 id="section-3.2-table-3.">Section 3.2, Table 3.</h3>
<p>In the <code>prog3</code> directory, type:</p>
<pre><code>$ ./kdensity ../occdata/7579_2.txt 1 2 min21
$ ./kdensity ../occdata/7579_2.txt 1 2 med21
$ ./kdensity ../occdata/7579_81.txt 1 2 min81
$ ./kdensity ../occdata/7579_81.txt 1 2 med81</code></pre>
<p>The results of the K-density estimations are written to the subdirectory given as the last argument on the command line.</p>
<p>To obtain the numbers shown in Table 3, run the following R script:</p>
<pre><code>$ R --no-save &lt; summary.R [dir]</code></pre>
<p>where <code>[dir]</code> is one of the directories stated above.</p>
<h3 id="section-3.3-figures-1-2-and-3.">Section 3.3, Figures 1, 2, and 3.</h3>
<p>All three figures are generated by typing:</p>
<pre><code>$ R --no-save &lt; gampsi.R [dir]</code></pre>
<p>where <code>[dir]</code> is one of the directories shown above. This script produces the following files in the <code>TOPDIR/nclass/[dir]</code> directory:</p>
<ul>
<li><code>[dir]ggam_cnt.txt</code>: the frequency data to generate Figure 1.</li>
<li><code>[dir]cumdisttab.txt</code>: the cumulative frequency data to generate Figure 2.</li>
<li><code>[dir]gamtab.txt</code> and <code>[dir]psitab.txt</code>: the values of the localization and dispersion indices for each technology class that are used to draw the four bar plots in Figure 3.</li>
</ul>
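<p>Table 3 reports counts of localized and dispersed technology classes, which <code>summary.R</code> derives from the per-class localization and dispersion indices. The Python sketch below shows one way such counts can be formed. The rule it applies (localized if the localization index is positive; dispersed if the localization index is zero and the dispersion index is positive) is the convention of Duranton and Overman's K-density test; whether <code>summary.R</code> uses exactly this rule is an assumption, and the names <code>classify</code> and <code>count_classes</code> are hypothetical.</p>

```python
def classify(gam, psi):
    """Label one technology class from its global indices.

    ASSUMED convention (Duranton-Overman style): localized if the
    localization index gam > 0; dispersed if gam == 0 and the
    dispersion index psi > 0; otherwise neither.
    """
    if gam > 0:
        return "localized"
    if psi > 0:
        return "dispersed"
    return "neither"


def count_classes(gams, psis):
    """gams, psis: dicts mapping technology class -> index value."""
    counts = {"localized": 0, "dispersed": 0, "neither": 0}
    for cls, gam in gams.items():
        counts[classify(gam, psis.get(cls, 0.0))] += 1
    return counts
```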
<h3 id="section-3.4-tables-4-and-5.">Section 3.4, Tables 4 and 5.</h3>
<p>To perform the matching-rate tests shown in Table 4, type:</p>
<pre><code>$ ./matchingrate ../occdata/7579_81.txt 1 2 1 share_one_match dummy min811
$ ./matchingrate ../occdata/7579_81.txt 1 2 1 majority_match dummy med811</code></pre>
<p>The result files for each technology class are written to the subdirectory given as the last argument on the command line.</p>
<p>To obtain the results in Table 5, type:</p>
<pre><code>$ ruby sets_dotfk.rb [dir]</code></pre>
<p>where <code>[dir]</code> is one of the directories specified above.</p>
<h3 id="section-3.4-figures-4-and-5.">Section 3.4, Figures 4 and 5.</h3>
<p>To obtain the frequency distribution shown in Figure 4, type:</p>
<pre><code>$ R --no-save &lt; gampsi2.R [dir] [set] [geographical aggregation level]</code></pre>
<p>where</p>
<ul>
<li><code>[dir]</code> is one of the directories specified above;</li>
<li><code>[set]</code> is either <code>do01tfk0</code>, the collection of technology classes that are localized in the K-density test but not in the matching-rate test, or <code>do01tfk1</code>, the collection of technology classes that are localized in the matching-rate test but not in the K-density test; and</li>
<li><code>[geographical aggregation level]</code> is one of <code>county</code>, <code>cmsa</code>, or <code>state</code>.</li>
</ul>
<p>The command produces the output files named <code>[dir]_[set]_[geographical aggregation level]ggam_cnt.txt</code> under the <code>TOPDIR/nclass/[dir]_[set]_[geographical aggregation level]</code> directory.</p>
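<p>The naming rule for the <code>gampsi2.R</code> output gives six files for each <code>[dir]</code>: two sets times three aggregation levels. The Python helper below, which is not part of the archive and whose name <code>gampsi2_outputs</code> is hypothetical, lists the expected paths so that missing runs can be spotted.</p>

```python
from itertools import product
from pathlib import Path

SETS = ("do01tfk0", "do01tfk1")
LEVELS = ("county", "cmsa", "state")


def gampsi2_outputs(topdir, run_dir):
    """Expected gampsi2.R output paths for one [dir], per the naming rule:
    TOPDIR/nclass/[dir]_[set]_[level]/[dir]_[set]_[level]ggam_cnt.txt
    """
    paths = []
    for set_name, level in product(SETS, LEVELS):
        stem = f"{run_dir}_{set_name}_{level}"
        paths.append(Path(topdir) / "nclass" / stem / f"{stem}ggam_cnt.txt")
    return paths
```

<p>For instance, <code>[p for p in gampsi2_outputs(".", "med81") if not p.exists()]</code>, run from <code>TOPDIR</code>, lists the files that have not been produced yet.</p>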
<p>To implement the neighboring-region augmented tests presented in Figure 5, recompile the binary program <code>matchingrate</code> using <code>tfk_adj.cpp</code>. Then type:</p>
<pre><code>$ R --no-save &lt; gampsi2.R med81 do01tfk0 state
$ R --no-save &lt; gampsi2.R med81 do01tfk0 county</code></pre>
<p>The results are used to produce Figures 4 and 5 in Section 3.4.</p>
<h2 id="section-4.">Section 4.</h2>
<h3 id="the-batch-files">The batch files</h3>
<p>The sensitivity analysis repeats the K-density and matching-rate tests for every combination of 10 values of <span class="math"><em>ϕ</em></span>, 4 values of <span class="math">Γ</span>, and 6 cases, that is, 10 × 4 × 6 = 240 times. The batch files <code>DO24816.sh</code>, <code>MR24816.sh</code>, <code>DO5_100.sh</code>, and <code>MR5_100.sh</code> run these repetitions automatically and create the subdirectories in which the outputs are saved. The programs are named &quot;<code>DO</code>&quot; for the K-density tests and &quot;<code>MR</code>&quot; for the matching-rate tests.</p>
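<p>The run count follows from enumerating every combination of the three dimensions. The Python sketch below is illustrative only: the four levels (2, 4, 8, 16) are those used by <code>DO24816.sh</code> and <code>MR24816.sh</code>, whereas the ten values of <span class="math"><em>ϕ</em></span> are not listed in this README and appear only as placeholder indices 1 to 10. The name <code>sensitivity_runs</code> is hypothetical.</p>

```python
from itertools import product

# Placeholder indices: the actual phi values are set in the batch scripts
# and are not listed in this README.
PHI_INDICES = range(1, 11)  # 10 values of phi
LEVELS = (2, 4, 8, 16)      # levels used by DO24816.sh and MR24816.sh
CASES = range(1, 7)         # 6 cases


def sensitivity_runs():
    """Return one (phi index, level, case) tuple per test run."""
    return list(product(PHI_INDICES, LEVELS, CASES))


if __name__ == "__main__":
    print(len(sensitivity_runs()))  # 10 * 4 * 6 = 240
```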
<h3 id="compiling-programs-1">Compiling Programs</h3>
<p>To generate the matching-rate test program, type the following in the <code>prog4</code> directory:</p>
<pre><code>$ make -DTFK</code></pre>
<p>To generate the K-density test program, type:</p>
<pre><code>$ make</code></pre>
<p>For the sensitivity analysis, rename the generated binary from <code>CITEDIST</code> to <code>DO</code> for the K-density tests and to <code>MR</code> for the matching-rate tests.</p>
<h3 id="section-4-figures-7-and-8.">Section 4, Figures 7 and 8.</h3>
<p>To obtain the results in Figures 7 and 8, run the following shell scripts in the <code>prog4</code> directory:</p>
<pre><code>$ sh DO24816.sh
$ sh MR24816.sh</code></pre>
<p>These scripts perform the sensitivity analysis for each <span class="math"><em>ϕ</em></span> at each level of <span class="math">Γ</span> (= 2, 4, 8, 16) and save the results in the corresponding subdirectories. To compile and visualize the results, run the following commands:</p>
<pre><code>$ sh DO24816.sh
$ ruby aggregate.rb; R --no-save &lt; sens_graph.R
$ sh MR24816.sh
$ ruby aggregate.rb; R --no-save &lt; sens_graph.R</code></pre>
<h3 id="section-4-figure-9.">Section 4, Figure 9.</h3>
<p>To obtain the results shown in Figure 9, type the following commands in the <code>prog4</code> directory:</p>
<pre><code>$ sh DO5_100.sh
$ ruby aggregate5_100.rb
$ sh MR5_100.sh
$ ruby aggregate5_100.rb</code></pre>
<p>These scripts perform the sensitivity analysis for each <span class="math"><em>ϕ</em></span> at each level of <span class="math">Γ</span> (= 5, 10, ..., 75, 80, 100).</p>
<p>To collect the sensitivity results for cases 2 and 6, type:</p>
<pre><code>$ R --no-save &lt; sens5_100.R</code></pre>
<p>The output results are used to draw Figure 9.</p>
</body>
</html>
